11 research outputs found

    Adquisición y representación del conocimiento mediante procesamiento del lenguaje natural

    [Abstract] This thesis introduces a framework for information retrieval that combines natural language processing with domain knowledge, covering the whole process of creating, managing and querying a document collection. The approach automatically integrates linguistic knowledge into a formal model of semantic representation that the system can handle directly. This allows the construction of algorithms that simplify maintenance tasks, give non-specialist users more flexible access, and eliminate subjective components that lead to hard-to-predict behavior. Linguistic knowledge acquisition starts from a dependency parse based on a mildly context-sensitive grammatical formalism, thereby combining computational efficiency with expressive power. The formal interpretation of the semantics rests on the notion of conceptual graph, which serves as the basis for representing both the collection and the queries posed against it. In this context, the proposal addresses the automatic generation of these representations from the linguistic knowledge acquired from the texts; they constitute the starting point for indexing. Graph operations, together with the principles of projection and generalization, are then used to compute and rank the answers, so that the inherent imprecision and incompleteness of retrieval are taken into account. In addition, the visual nature of graphs allows the construction of user-friendly interfaces that reconcile precision and intuition in their management. The proposal also includes a framework for formal testing.

    A library for automatic natural language generation of Spanish texts

    In this article we present a novel system for natural language generation (NLG) of Spanish sentences from a minimum set of meaningful words (such as nouns, verbs and adjectives) which, unlike other state-of-the-art solutions, performs the NLG task in a fully automatic way, exploiting both knowledge-based and statistical approaches. Relying on its linguistic knowledge of vocabulary and grammar, the system is able to generate complete, coherent and correctly spelled sentences from the main word sets presented by the user. The system, which was designed to be integrable, portable and efficient, can be easily adapted to other languages and can feasibly be integrated in a wide range of digital devices. During its development we also created a supplementary lexicon for Spanish, aLexiS, with wide coverage and high precision, as well as syntactic trees from a freely available definite-clause grammar. The resulting NLG library has been evaluated both automatically and manually (annotation). The system can potentially be used in different application domains such as augmentative communication and automatic generation of administrative reports or news. Funding: Xunta de Galicia | Ref. ED341D R2016/012; Xunta de Galicia | Ref. GRC 2014/046; Ministerio de Economía, Industria y Competitividad | Ref. TEC2016-76465-C2-2-

    A System for Automatic English Text Expansion

    This work was supported in part by the Mineco, Spain, under Grant TEC2016-76465-C2-2-R, in part by the Xunta de Galicia, Spain, under Grant GRC-2018/53 and Grant ED341D R2016/012, and in part by the University of Vigo Travel Grant to visit the CLAN Research Group, University of Aberdeen, U.K. Peer reviewed. Publisher PD

    A system for automatic English text expansion

    We present an automatic text expansion system to generate English sentences, which performs automatic Natural Language Generation (NLG) by combining linguistic rules with statistical approaches. Here, “automatic” means that the system can generate coherent and correct sentences from a minimum set of words. From its inception, the design is modular and adaptable to other languages. This adaptability is one of its greatest advantages. For English, we have created the highly precise aLexiE lexicon with wide coverage, which represents a contribution on its own. We have evaluated the resulting NLG library in an Augmentative and Alternative Communication (AAC) proof of concept, both directly (by regenerating corpus sentences) and manually (from annotations) using a popular corpus in the NLG field. We performed a second analysis by comparing the quality of text expansion in English to Spanish, using an ad-hoc Spanish-English parallel corpus. The system might also be applied to other domains such as report and news generation. Funding: Ministerio de Economía, Industria y Competitividad | Ref. TEC2016-76465-C2-2-R; Xunta de Galicia | Ref. GRC-2018/53; Xunta de Galicia | Ref. ED341D R2016/012; University of Aberdee

    Métodos y técnicas de monitoreo y predicción temprana en los escenarios de riesgos socionaturales

    This work gathers the fundamental methods and techniques for tracking and monitoring the dynamics of socio-natural risk scenarios (geological and hydrometeorological). Its general aim is to guide, support and accompany civil protection managers and operational staff in putting into practice the actions and public policies focused on local disaster risk management

    De la adquisición del conocimiento a la recuperación de información

    We introduce a proposal for information retrieval based on complex syntactic and semantic resources that are automatically generated from the document collection itself. The paper describes a strategy in which the process is independent of the language and domain of the documents. Work partially supported by the Spanish Government through research projects TIN2004-07246-C03-01 and HUM2007-66607-C04-02, and by the Autonomous Government of Galicia through projects PGIDIT05PXIC30501PN, 07SIN005206PR and the Galician Network for NLP and IR

    Identifying banking transaction descriptions via support vector machine short-text classification based on a specialized labelled corpus

    Short texts are omnipresent in real-time news, social network commentaries, etc. Traditional text representation methods have been successfully applied to self-contained documents of medium size. However, information in short texts is often insufficient, due, for example, to the use of mnemonics, which makes them hard to classify. Therefore, the particularities of specific domains must be exploited. In this article we describe a novel system that combines Natural Language Processing techniques with Machine Learning algorithms to classify banking transaction descriptions for personal finance management, a problem that was not previously considered in the literature. We trained and tested that system on a labelled dataset with real customer transactions that will be available to other researchers on request. Motivated by existing solutions in spam detection, we also propose a short-text similarity detector, based on the Jaccard distance, to reduce training set size. Experimental results with a two-stage classifier combining this detector with an SVM indicate a high accuracy in comparison with alternative approaches, taking into account complexity and computing time. Finally, we present a use case with a personal finance application, CoinScrap, which is available at "Google Play" and "App Store". Funding: Ministerio de Economía, Industria y Competitividad | Ref. TEC2016-76465-C2-2-R; Xunta de Galicia | Ref. GRC2018/053; Xunta de Galicia | Ref. ED341D-R2016/01
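
    The Jaccard-distance idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' detector: the whitespace tokenization, the 0.5 threshold, the greedy filtering order, the function names and the sample descriptions are all assumptions made for illustration only.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words (assumed tokenization)."""
    return set(text.lower().split())


def jaccard_distance(a, b):
    """Jaccard distance between two token sets: 1 - |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def filter_near_duplicates(texts, threshold=0.5):
    """Greedily keep a text only if it is at least `threshold` away
    from every text already kept (hypothetical reduction strategy)."""
    kept, kept_tokens = [], []
    for text in texts:
        t = tokens(text)
        if all(jaccard_distance(t, k) >= threshold for k in kept_tokens):
            kept.append(text)
            kept_tokens.append(t)
    return kept


# Invented transaction descriptions, for illustration only.
samples = [
    "card payment supermarket madrid",
    "card payment supermarket vigo",
    "salary transfer company",
]
reduced = filter_near_duplicates(samples)
print(reduced)
```

    In this sketch the second description shares three of five distinct words with the first, so it falls below the assumed threshold and is dropped, while the third is kept. A reduced set of this kind could then be passed to a classifier such as an SVM.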